Introduction


The process of exploring for oil and gas is rich in uncertainties.
Any attempt to forecast returns to investment in exploration must take
them into account in a systematic way. By this we mean that inferences
about the important uncertain quantities characterizing the exploration
process should be based on a mathematical model whose parameters may be
estimated from observable data in a coherent way. At the root of any
useful model of the exploration process, then, is a set of assumptions
that delineate in clear, unambiguous terms the probability law governing
the manner in which observable data are generated.

Our first objective is to construct a model of the exploration
process that allows us to test empirically the hypothesis that at an
early stage in the exploration of a basin, the process behaves like
sampling without replacement. The model we posit is parsimonious: it is
based on a small number of assumptions and indexed by only five parameters.

The set of assumptions on which it is built reflects at least two qualitative
assertions often made by oilmen: the big ones tend to be found first, and
the size distribution of fields is highly skewed. We may use it to
compute answers to two questions of paramount importance in designing
exploration strategy:

(1) How does the probability that a wildcat well will find a
    reservoir change (if at all) as the history of a basin unfolds?

(2) What is the probability that a yet-to-be-drilled wildcat well
    will find a reservoir of a given size or greater at a given
    point in the development of a basin?

Our second objective is to posit a reasonable model of the spatial
distribution of petroleum reservoirs that conforms to a number of empirically
observed facts about such distributions, but does not possess three
unrealistic attributes that characterize models of spatial occurrence
appearing in the literature: dependence of the model on an arbitrary
subdivision of a basin into units of subspace, the assumption of spatial
homogeneity of the stochastic process operating within each such unit as
well as across units, and conceptualization of a reservoir as a point
(in the plane) rather than as an object with positive area. (See Uhler
and Bradley [1970], Allais [1957], Engel [1957].)

The first model we pose differs significantly from those postulated
by Arps and Roberts [1956], and by Kaufman [1963]. It accounts for the
impact of exploration technology on the probability of discovering a new
reservoir in an explicit and intuitively meaningful way. And it is
structured so that inferences about parameters not known with certainty
may be made in accordance with well understood statistical principles. In
particular, the assumption that the probability of discovering a reservoir
is proportional to its size strongly biases any "usual" estimator when
the sample size is small, so we develop methods for coping with this
complicating feature of the data-generating process.

Our spatial model has not yet been subjected to empirical
validation. However, its structure is sufficiently flexible to warrant
the conjecture that it will in fact prove to be a reasonable characterization
of a process that can by visual inspection be seen to be spatially
inhomogeneous; i.e., fields tend to cluster rather than to be spread evenly
throughout a basin. Under the direction of one of the authors, Golovin
[1970] has programmed versions of this model and done computational
exploration of some of its features. We shall draw heavily on his work
in our discussion.

Nomenclature


I. A Model of the Discovery Process 


The technology available for identifying potential oil- and/or
gas-bearing structures is not perfect. We shall assume that if this
technology is applied to the entire areal extent of a generic basin it
will delineate $M$ distinguishable prospects. We label them $1, 2, \ldots, M$ and
call $\mathcal{M} = \{1, 2, \ldots, M\}$ the label set for the population of
prospects in this basin. Each prospect either is or isn't a field; by
"field" we mean a hydrocarbon-bearing reservoir or a collection of
contiguous reservoirs. (Precision in defining what we mean by a "field" is
not important at this juncture.) If it is a field, the field has many
characteristics of interest; momentarily, we focus on only one, its
areal extent.

Let

$$
x_i =
\begin{cases}
1 & \text{if the } i\text{th prospect is a field,} \\
0 & \text{otherwise,}
\end{cases}
$$

and define

$$
A_i = \text{areal extent of the } i\text{th prospect.}
$$

Then $(x_i, A_i)$ for $i \in \mathcal{M}$ is a characteristic of the $i$th population
element. We do not know $\theta_M = \{(x_i, A_i) \mid i \in \mathcal{M}\}$ with certainty prior to
beginning exploration of the basin. One of our objectives is to make
inferences about $\theta_M$ as prospects are delineated and fields discovered.
In particular we wish to know which elements of $\theta_M$ have $x_i = 1$, since the
$i$th prospect is by definition a field if and only if $x_i = 1$.

At the outset of exploration, the exploration process will
generate only a small subset of potential prospects in the basin, say
$n < M$ of them with population labels $i_1, \ldots, i_n$. And only a subset of
$k < n$ of these prospects will have been drilled. Hence a sample of
size $n$ is an ordered sequence of $n$ of the population elements,
$(i_1, \ldots, i_n)$ with $i_l \in \mathcal{M}$ for $l = 1, 2, \ldots, n$, together with an ordered
n-tuple of observed characteristics; e.g.,

$$
[(i_1, \ldots, i_n);\ (x_{i_1}, A_{i_1}),\ A_{i_2},\ A_{i_3},\ \ldots,\ (x_{i_n}, A_{i_n})].
$$

There will be no loss in generality in the context of the model we deal
with here if we relabel those prospects that have been drilled in the
order in which they were drilled and re-order as follows:

$$
[(i_1, \ldots, i_n);\ (x^{(1)}, A^{(1)}), \ldots, (x^{(k)}, A^{(k)}),\ A_{j_1}, A_{j_2}, A_{j_3}, \ldots].
$$

In fact our model will allow us to ignore the ordering of the areas $A_j$ of
prospects that have been generated at a given point in time but not
drilled, so we define a sample as

$$
H_{n,k} = [(i_1, \ldots, i_n);\ (x^{(1)}, A^{(1)}), \ldots, (x^{(k)}, A^{(k)});\ \{A_j\}]
$$

where it is understood that the element $\{A_j\}$ is the set of areas of
undrilled prospects generated by the exploration process at the instant
when the $(k+1)$st well is to be drilled; $r$ is $\sum_{t=1}^{k} x^{(t)}$, the number of
fields found by the first $k$ wells. We shall use $H_{n,k}$ as shorthand
for a complete description of a sample when no ambiguity will arise.

In order to describe the assumptions on which our model is
based, we need the following array of notational ammunition:¹


1. A summary of symbols is given at the end of the paper in Table 1.

$I_N = \{i \mid i \in \mathcal{M} \text{ and } x_i = 1\}$, the label set of fields in the basin,

$S_N = \sum_{i \in I_N} A_i$, the total area of the $N$ fields in the basin,

$R_M = \sum_{i=1}^{M} A_i$, the total area of the $M$ prospects in the basin,

$J_k = \{t \mid x^{(t)} = 1 \text{ for } t = 1, 2, \ldots, k\}$, the label set of successful
wells among the first $k$ wells drilled,

$\bar{J}_k = \{t \mid x^{(t)} = 0 \text{ for } t = 1, 2, \ldots, k\}$, the label set of
unsuccessful wells among the first $k$ wells drilled,

$s_k = \sum_{t \in J_k} A^{(t)}$, the total area of fields discovered by the
first $k$ wells, and

$u_k = \sum_{t=1}^{k} A^{(t)}$, the total area of prospects drilled by the first
$k$ wells.



The Data-Generating Model

We shall assume that the process generating observable data
has the following properties:

1. Constant Technology

Given $S_N$ and $R_M$, and conditional on observing a sample
$H_{n,k}$ with statistics $s_k$ and $u_k$,

$$
P(x^{(k+1)} = 1 \mid H_{n,k}) = \frac{S_N - s_k}{R_M - u_k}.
$$

This assumption says that the probability that the $(k+1)$st well will
discover a field changes in a "hypergeometric-like" fashion with changes
in $s_k$ and $u_k$. The ratio $S_N/R_M$ does not depend on either $s_k$ or $u_k$ and is
a rough measure of technological efficiency, hence the label "constant
technology".

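2. Probabilistic Proportionality

Given $\{A_i \mid i \in \mathcal{M} \text{ and } x_i = 1\}$, and conditional on
observing $H_{n,k}$ and $x^{(k+1)} = 1$, the probability that the $(k+1)$st well
discovers a field of areal extent $A$ is

$$
P(A^{(k+1)} = A \mid x^{(k+1)} = 1, H_{n,k}) =
\begin{cases}
\dfrac{A}{S_N - s_k} & \text{if } A \in \{A_i \mid x_i = 1 \text{ and field } i \text{ is as yet undiscovered}\}, \\[1ex]
0 & \text{otherwise.}
\end{cases}
$$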


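Assumptions 1 and 2 formalize the idea that the probability
of discovering a field of areal extent $A$ is proportional to $A$, for given
$R_M$ and $S_N$:

$$
P(x^{(k+1)} = 1,\ A^{(k+1)} = A \mid H_{n,k}) =
\begin{cases}
\dfrac{A}{R_M - u_k} & \text{if } A \in \{A_i \mid x_i = 1 \text{ and field } i \text{ is as yet undiscovered}\}, \\[1ex]
0 & \text{otherwise.}
\end{cases}
$$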
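
Question (2) of the Introduction can be illustrated numerically. Under Assumptions 1 and 2, the probability that the next well discovers a particular undiscovered field of area $A$ is $A/(R_M - u_k)$, so the probability of finding a field of at least a given size is the sum of that ratio over the qualifying undiscovered fields. The following is only a minimal sketch in Python: the function name and its arguments are hypothetical, and it assumes the undiscovered field areas are known, which in practice they are not.

```python
def prob_find_at_least(threshold, undiscovered_areas, R_M, u_k):
    """P(next well finds a field of area >= threshold | history).

    undiscovered_areas: areas A_i of fields not yet found (assumed known
    here purely for illustration).
    R_M: total area of all prospects; u_k: total area already drilled.
    """
    remaining = R_M - u_k
    if remaining <= 0:
        raise ValueError("no undrilled prospect area remains")
    # Each qualifying field contributes A / (R_M - u_k).
    return sum(A for A in undiscovered_areas if A >= threshold) / remaining


# Three undiscovered fields, 16 area units of prospects still undrilled.
p_any = prob_find_at_least(0.0, [5.0, 1.0, 3.0], R_M=20.0, u_k=4.0)
p_big = prob_find_at_least(2.0, [5.0, 1.0, 3.0], R_M=20.0, u_k=4.0)
print(p_any, p_big)  # 0.5625 0.5
```

With a threshold of zero the sum reduces to $(S_N - s_k)/(R_M - u_k)$, the success probability of Assumption 1.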


Both assumptions ignore the information content of the statistic $\{A_j\}$,
the set of areas $A_j$ of prospects generated prior to drilling the
$(k+1)$st well but as yet undrilled, and exploit only the information generated
by the outcome of drilling the first $k$ wells. In order to exploit all
information in $H_{n,k}$, we would have to build a model of the process
generating prospects as well as of the one generating discoveries. We have
chosen to suppress this complicating feature in our preliminary investigation.

3. Probability Law of $\{A_i \mid i \in I_N\}$

$\{A_i \mid i \in I_N\}$ is a set of mutually independent, identically
distributed random variables, each characterized by a density $f(\cdot \mid \theta)$.
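
The three assumptions can be exercised with a small simulation. The sketch below is illustrative only and is not the authors' program: the function name, its arguments, and the seeds are hypothetical. The paper does not say how the area of a dry prospect is selected, so the sketch assumes, purely for illustration, that a dry prospect is drawn uniformly at random from those not yet drilled.

```python
import random


def simulate_discovery(field_areas, dry_areas, n_wells, seed=0):
    """Simulate drilling under Assumptions 1 and 2.

    Returns a list of (x, A) pairs in drilling order: x is 1 for a
    discovery and 0 for a dry hole, A is the area of the prospect drilled.
    """
    rng = random.Random(seed)
    fields = list(field_areas)    # areas of undiscovered fields
    dry = list(dry_areas)         # areas of undrilled dry prospects
    s_rem = sum(fields)           # S_N - s_k
    r_rem = s_rem + sum(dry)      # R_M - u_k
    history = []
    for _ in range(n_wells):
        if not fields and not dry:
            break
        # Assumption 1: success with probability (S_N - s_k) / (R_M - u_k).
        success = bool(fields) and (not dry or rng.random() < s_rem / r_rem)
        if success:
            # Assumption 2: a field is found with probability A / (S_N - s_k).
            idx = rng.choices(range(len(fields)), weights=fields)[0]
            area = fields.pop(idx)
            s_rem -= area
            history.append((1, area))
        else:
            # ASSUMPTION (not in the paper): dry prospect chosen uniformly.
            area = dry.pop(rng.randrange(len(dry)))
            history.append((0, area))
        r_rem -= area
    return history


# Assumption 3 with a Lognormal f: draw areas, then drill 30 wells.
rng = random.Random(42)
fields = [rng.lognormvariate(1.0, 1.0) for _ in range(20)]
dry = [rng.lognormvariate(1.0, 1.0) for _ in range(80)]
history = simulate_discovery(fields, dry, n_wells=30, seed=7)
```

Because large fields are favored at every draw, discoveries early in `history` tend to be larger than later ones, which is the "big ones are found first" effect the model is meant to capture.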

Likelihood Function


The likelihood function generated by observation of a sample
$H_{n,k}$ is, defining $u_0 = 0$ and $s_0 = 0$,

$$
\begin{aligned}
L(N, R_M, \theta, S_N \mid H_{n,k})
= {} & \prod_{t=1}^{k}
\left(\frac{S_N - s_{t-1}}{R_M - u_{t-1}}\right)^{x^{(t)}}
\left(1 - \frac{S_N - s_{t-1}}{R_M - u_{t-1}}\right)^{1 - x^{(t)}} \\
& \times \prod_{t \in J_k} \left[\frac{A^{(t)}}{S_N - s_{t-1}}\right] f(A^{(t)} \mid \theta) \\
& \times f^{*(N-r)}(S_N - s_k \mid \theta)
\end{aligned}
\tag{1.1}
$$


where $f^{*(N-r)}$ is the $(N-r)$-fold convolution of $f$ with itself. The
appearance of the term $f^{*(N-r)}(S_N - s_k \mid \theta)$ may be explained like this:
the process of generating observations does so in two stages. First,
nature generates $N$ values $\{A_i \mid i \in I_N\}$. Then the observables are
generated in a way that depends probabilistically on $S_N = \sum_{i \in I_N} A_i$.
Consequently, $S_N$ is a parameter of the observational process (1 and 2)
and at the same time a statistic from the vantage point of the process
generating field areas (3). If we wish to make inferences about
$N$, $R_M$, $\theta$, and $S_N$ jointly, then $S_N$ appears in both roles.

The likelihood function (1.1) may be rewritten as proportional
to:

$$
\prod_{t \in \bar{J}_k} \left(1 - \frac{S_N - s_{t-1}}{R_M - u_{t-1}}\right)
\times \prod_{t \in J_k} \frac{1}{R_M - u_{t-1}}\, f(A^{(t)} \mid \theta)
\times f^{*(N-r)}(S_N - s_k \mid \theta)
\tag{1.2}
$$



Approximation of Likelihood Function 

In general, working directly with $L(N, R_M, \theta, S_N \mid H_{n,k})$ is
difficult. However, when $N-r$ is very large we can apply the (equal
components) Central Limit Theorem; i.e., if $f$ has mean $m \in (-\infty, +\infty)$
and bounded variance $V$, then as $N-r$ increases, $f^{*(N-r)}$ becomes more and
more accurately approximated at each value of its domain by a Normal
density $f_N(\cdot \mid m[N-r], V[N-r])$ with mean $m[N-r]$ and variance $V[N-r]$.²

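Here we are interested in the behavior of $L$ when $f$ is a Lognormal
density with parameter $\theta = (\mu, \sigma^2)$:

$$
f(x \mid \theta) = f_L(x \mid \mu, \sigma) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{1}{2}(\log_e x - \mu)^2/\sigma^2} & \text{if } x > 0, \\[1ex]
0 & \text{otherwise.}
\end{cases}
\tag{1.3}
$$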


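Combining the Normal approximation suggested above with $f$ as in (1.3),
that portion of (1.1) involving $\mu$ and $\sigma$ may be written as proportional to

$$
\sigma^{-r} \exp\left\{-\frac{1}{2}\,\frac{r(g - \mu)^2 + v}{\sigma^2}\right\}
\,\bigl(V[N-r]\bigr)^{-1/2}
\exp\left\{-\frac{1}{2}\,\frac{(S_N - s_k - m[N-r])^2}{V[N-r]}\right\}
\tag{1.4}
$$

where

$$
m = \exp\{\mu + \tfrac{1}{2}\sigma^2\}, \qquad
V = m^2[\exp\{\sigma^2\} - 1],
$$

$$
g = \frac{1}{r} \sum_{t \in J_k} \log A^{(t)}, \qquad
v = \sum_{t \in J_k} (\log A^{(t)})^2 - r g^2.
$$


2. Provided $\int |f(\xi \mid \theta)|^2 \, d\xi < \infty$.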
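
To make the approximation concrete, the sketch below evaluates the logarithm of the likelihood in its proportional form, with $f$ Lognormal and the convolution term replaced by a Normal density with mean $m[N-r]$ and variance $V[N-r]$. It is an illustration under stated assumptions, not the authors' code: the function names and the representation of a sample as a list of `(x, A)` pairs in drilling order are hypothetical. The Lognormal log-density keeps its $-\log A$ term, which is constant in the parameters and does not affect the location of the maximum.

```python
import math


def lognormal_logpdf(x, mu, sigma):
    """Log of the Lognormal density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def approx_log_likelihood(N, R_M, S_N, mu, sigma, wells):
    """Approximate log likelihood; wells is a list of (x, A) in drilling order."""
    r = sum(x for x, _ in wells)          # number of discoveries
    if sigma <= 0 or N <= r:
        return -math.inf
    s = u = 0.0                           # running s_{t-1} and u_{t-1}
    total = 0.0
    for x, area in wells:
        remaining = R_M - u
        if remaining <= 0:
            return -math.inf
        p = (S_N - s) / remaining         # Assumption 1
        if x == 1:
            if p <= 0 or p > 1:
                return -math.inf
            # Success factor 1 / (R_M - u_{t-1}) times f(A | theta).
            total += -math.log(remaining) + lognormal_logpdf(area, mu, sigma)
            s += area
        else:
            if p < 0 or p >= 1:
                return -math.inf
            total += math.log(1.0 - p)    # dry-hole factor
        u += area
    m = math.exp(mu + 0.5 * sigma ** 2)               # Lognormal mean
    V = m * m * (math.exp(sigma ** 2) - 1.0)          # Lognormal variance
    var = V * (N - r)
    resid = S_N - s - m * (N - r)
    # Normal approximation to the (N - r)-fold convolution term.
    total += -0.5 * math.log(2 * math.pi * var) - 0.5 * resid * resid / var
    return total
```

Parameter values that are impossible given the data, such as a total field area $S_N$ smaller than the area already discovered, return negative infinity so that a search routine can skip them.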


Maximum Likelihood Estimation 

It will be convenient to work with $m$ in place of $\mu$ in the sequel,
and we shall do so. To find a maximum likelihood estimator (MLE) of
parameters $m$, $\sigma^2$, $N$, $R_M$, and $S_N$ when the likelihood function is of the
form (1.1) is analytically difficult. We employ the following procedure:³

1. Fix the value of $R_M$.

2. Find an MLE $m^*(\sigma, N, S_N)$ of $m$ for fixed $\sigma$, $N$, and $S_N$.

3. Holding $N$ and $S_N$ fixed, substitute $m^*(\sigma, N, S_N)$ for $m$ in (1.4);
   find MLE's of $m$ and $\sigma^2$ by searching (1.4) over $\sigma^2 \in (0, \infty)$.
   Call this pair $[\hat{m}(N, S_N), \hat{\sigma}^2(N, S_N)]$.

4. Repeat step 3 for a large set of values of the ordered pair
   $(N, S_N)$, and tabulate the value of log likelihood for each
   $(N, S_N)$ at $[m, \sigma^2] = [\hat{m}(N, S_N), \hat{\sigma}^2(N, S_N)]$.

5. Search tabulated values of the log likelihood for an approximate
   maximizer $(N^*, S^*, \hat{m}(N^*, S^*), \hat{\sigma}^2(N^*, S^*))$ of (1.1), given $R_M$.

6. Repeat steps 2 through 5 for a set of values of $R_M$.

7. Search log likelihood values for an approximate (joint) MLE
   of all parameters.


3. In practice we have utilized the gradient method developed by
Goldfeld et al. [1966] to simultaneously estimate $\mu$ (or $m$) and $\sigma$
conditional upon the pair $(N, S_N)$. This creates the tableau described
in step 4. It may prove possible to employ this method to simultaneously
estimate all parameter values, thus eliminating the search procedure of
steps 4-7. Using data on exploratory drilling in Alberta, we have
estimated parameters for several regions. The data support the hypothesis
that the sizes of discoveries tend to decrease over time, but although
the estimates appear reasonable we regard them as too tentative to be
published at this time.
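
The nested search of steps 1 through 7 can be sketched as follows. This is an illustration, not the authors' implementation: the function name and the grid arguments are hypothetical, the inner maximization over $(\mu, \sigma)$ is done by a coarse grid rather than by the gradient method of footnote 3, and the search is parameterized by $\mu$ rather than $m$ for simplicity. The argument `loglik` is any callable taking `(N, R_M, S_N, mu, sigma)` and returning a log likelihood.

```python
import itertools
import math


def profile_search(loglik, R_grid, N_grid, S_grid, mu_grid, sigma_grid):
    """Grid version of the seven-step search; returns (params, value, tableau)."""
    tableau = {}
    best_value, best_params = -math.inf, None
    for R_M in R_grid:                                      # steps 1 and 6
        for N, S_N in itertools.product(N_grid, S_grid):    # step 4
            # Steps 2-3: maximize over (mu, sigma) for fixed (N, S_N).
            inner_value, inner_mu, inner_sigma = -math.inf, None, None
            for mu, sigma in itertools.product(mu_grid, sigma_grid):
                value = loglik(N, R_M, S_N, mu, sigma)
                if value > inner_value:
                    inner_value, inner_mu, inner_sigma = value, mu, sigma
            # Tabulate the profiled log likelihood for this (R_M, N, S_N).
            tableau[(R_M, N, S_N)] = (inner_value, inner_mu, inner_sigma)
            if inner_value > best_value:                    # steps 5 and 7
                best_value = inner_value
                best_params = {"R_M": R_M, "N": N, "S_N": S_N,
                               "mu": inner_mu, "sigma": inner_sigma}
    return best_params, best_value, tableau
```

The returned `tableau` plays the role of the table in step 4, extended with one block per value of $R_M$; a finer grid or a local optimizer started from `best_params` would be a natural refinement.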



II. A Spatial Model


By a spatial model of the deposition of petroleum deposits, we
mean a stochastic process generating values of a sequence of random
variables in a way that jointly simulates the frequency distribution
of areal extent, the geographic location, and the shape of these deposits.
The first approaches that pop into one's mind are incorrect; i.e., viewing
the process generating the number of fields per unit area $A$ as a spatially
homogeneous Poisson process is incorrect; randomizing the parameter $\lambda(A)$
of such a process by assuming that $\lambda(A)$ is a random variable with Gamma
density (see Uhler and Bradley [1970]) leads to a better approximation,
but still is deficient in the tails; that is, a negative binomial
distribution doesn't fit well in the right tail. In addition, a compound
Poisson process, or a (randomized) modification of it, doesn't really
explain the "clustering close together" that one observes when examining
a map pinpointing oil and gas fields already discovered in a well-explored
basin.

The model we propose here is conceptually simple, extremely
flexible, and can be easily modified in many ways. We replace the
two-dimensional continuum with the lattice $L = \{(i,j) \mid i, j \text{ integer}\}$ of
ordered pairs of integers and equip it with the simplest of probabilistic
laws of motion: a symmetric random walk. We then define an imbedded
process that lays down a 1 or a 0 at first (or subsequent) passage of
the random walk through a lattice point. The assumptions we detail
shortly lead to pictures such as that shown in Figure 1 (Golovin [1970]).


Figure 1

Distinguishing features of the model used to generate Figure 1
are that reservoirs have positive area, there is a cluster effect, and
the frequency histogram of areal extents is, aside from a truncation
effect induced by clustering, asymptotically Lognormal.

Basic Definitions and Properties 

The model is composed of three basic objects: a symmetric
random walk on $L$, a random process superposed on the path taken by the
random walk, and a stopping rule.

Let $[i,j;t]$ denote the position on $L$ of the random walk at
trial $t$, $t = 0, 1, 2, \ldots$, and define

$$
\delta(i,j) =
\begin{cases}
1 & \text{if } (i,j) \text{ has been assigned a one at some } t^* < t, \\
0 & \text{if } (i,j) \text{ has been assigned a zero at some } t^* < t.
\end{cases}
$$

If the random walk has not passed through $(i,j)$ at some $t^* < t$, $\delta(i,j)$
is left undefined. We set

$$
I(t) = \{(i,j) \mid \delta(i,j) = 1 \text{ at trial } t\}
$$

and

$$
J(t) = \{(i,j) \mid \delta(i,j) = 0 \text{ at trial } t\},
$$

and define the state of the process at trial $t$ as a triplet consisting
of the location $[i,j;t]$ of the random walk at the end of trial $t$, the
set $I(t)$, and the set $J(t)$; i.e., $S_t = ([i,j;t], I(t), J(t))$. Let $h_t$
be the smallest non-negative integer such that $[i,j;t+h_t] \notin I(t) \cup J(t)$; $t+h_t$
is the first trial following trial $t$ at which first passage through
an unassigned point occurs. Set $t_0 = 0$, $t_k = \sum_{i=1}^{k} h_{t_{i-1}}$, $k \ge 1$, and define
on $t_0, t_1, \ldots, t_k, \ldots$ a sequence $\{\tilde{\ell}_{t_k},\ k = 1, 2, \ldots\}$ of mutually independent
random variables with common probability function

$$
P(\tilde{\ell}_{t_k} = j) =
\begin{cases}
1 - \pi & \text{if } j = 0; \\[0.5ex]
\pi \dbinom{m}{r} p^r (1-p)^{m-r} & \text{if } j = 2^r,\ r = 0, 1, 2, \ldots, m; \\[0.5ex]
0 & \text{otherwise;}
\end{cases}
$$

with $m$ a positive integer and $0 < p < 1$. The value $\ell_{t_k}$ of $\tilde{\ell}_{t_k}$ may be
interpreted as a "chain" of ones that the process will attempt to lay
down on points in the complement of $I(t)$ in $L$. Upon termination of the
assignment of ones that begins at $[i,j;t_k]$, the random walk continues
with no assignments made until at the (random) trial $t_{k+1} = t_k + h_{t_k}$ a
lattice point $[i,j;t_{k+1}] \notin I(t_k) \cup J(t_k)$ is reached. A value $\ell_{t_{k+1}}$ of $\tilde{\ell}_{t_{k+1}}$
is generated, and the assignment of ones begins anew as described above.

Assignment of ones is governed by the following rules, where we
let $N([i,j;t_k]) = \{(i+x, j+y) \mid x = \pm 1, y = \pm 1\}$, the set of nearest neighbors to
$[i,j;t_k]$ in $L$.

1. If no element of $N([i,j;t_k])$ is in $I(t_k)$, set $\delta([i,j;t_k]) = 1$.

2. If at least one element of $N([i,j;t_k])$ is in $I(t_k)$, set
   $\delta([i,j;t_k]) = 0$ and terminate the assignment of ones (from
   the "chain" of $\ell_{t_k}$ ones).

3. If $\delta([i,j;t_k]) = 1$, let the random walk continue, repeating
   step 1 until either:

   (a) $\ell_{t_k}$ ones have been assigned, beginning at $[i,j;t_k]$,

   or

   (b) a position $[i,j;t_k + \ell]$, $\ell < \ell_{t_k}$, is reached for which
       at least one element of $N([i,j;t_k + \ell])$ is in $I(t_k)$.

   Then terminate the assignment of ones from the "chain" of $\ell_{t_k}$ ones.

Clearly, the random time $h_{t_k} = t_{k+1} - t_k$ depends upon the
state of the process at trial $t_k$.⁴ And the number of ones assigned
to lattice points from the "chain" of $\ell_{t_k}$ ones depends in a very
complicated way on $S_{t_k}, S_{t_k+1}, \ldots, S_{t_k+\ell^*}$, where $\ell^*$ is the first integer
such that $\delta([i,j;t_k+\ell^*]) = 0$. In probabilistic parlance, the rules for
generating a value of $h_{t_k}$ and for the assignment of ones to lattice
points are called stopping rules.


4. There is no semantic confusion in using "time" $h_{t_k}$ to denote the number
of trials between $t_k$ and $t_{k+1}$, and we shall do so.
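
The construction can be sketched in a few lines of Python. This is a simplified illustration and not Golovin's program: the function names and default parameter values are hypothetical. It assumes that the walk steps to one of the four orthogonally adjacent lattice points with equal probability, that the neighbor set is the four diagonal points $x = \pm 1$, $y = \pm 1$ as printed above, that neighbors are tested against the ones in place when the current chain began, and that a point already assigned is passed through without reassignment while a chain is in progress.

```python
import random


def chain_length(rng, pi, m, p):
    """Draw a chain length: 0 w.p. 1 - pi, else 2**r with r ~ Binomial(m, p)."""
    if rng.random() >= pi:
        return 0
    r = sum(rng.random() < p for _ in range(m))
    return 2 ** r


def simulate_lattice(n_trials, pi=0.5, m=4, p=0.5, seed=0):
    """Run the walk for n_trials; return {(i, j): 0 or 1} for assigned points."""
    rng = random.Random(seed)
    delta = {}             # assigned lattice points; unassigned ones are absent
    ones = set()           # current set I(t)
    ones_at_start = set()  # I(t_k): ones in place when the current chain began
    remaining = 0          # ones still to be laid from the current chain
    i, j = 0, 0
    for _ in range(n_trials):
        if (i, j) not in delta:
            if remaining == 0:
                # First passage through an unassigned point: start a new chain.
                remaining = chain_length(rng, pi, m, p)
                ones_at_start = set(ones)
            diagonal = [(i + dx, j + dy) for dx in (-1, 1) for dy in (-1, 1)]
            blocked = any(q in ones_at_start for q in diagonal)
            if remaining > 0 and not blocked:
                delta[(i, j)] = 1          # rule 1: lay down a one
                ones.add((i, j))
                remaining -= 1
            else:
                delta[(i, j)] = 0          # rule 2 or 3(b), or a chain of length 0
                remaining = 0
        # Symmetric random walk step (ASSUMPTION: four orthogonal moves).
        di, dj = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        i, j = i + di, j + dj
    return delta
```

Plotting the points with value 1 from a long run gives a picture of clustered patches of positive area; the chain lengths $2^r$ with binomial $r$ are what make the patch sizes roughly Lognormal before truncation by neighboring clusters.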

Table 1

Summary List of Symbols


I. Model of the Discovery Process

$A_i$          surface area of the $i$th reservoir
$k$            number of wildcats drilled, i.e., number of prospects observed
$M$            number of prospective drilling sites
$N$            number of reservoirs in the basin
$r$            number of successful wildcats
$R_M$          total area of $M$ prospects in the basin
$s_k$          total (cumulative) area of reservoirs discovered by $k$ wildcats
$S_N$          total area of $N$ reservoirs in the basin
$\theta$       parameter set for the density function of $A_i$; $\theta = (\mu, \sigma^2)$
$u_k$          total (cumulative) area of prospects drilled by $k$ wildcats
$x_i$          outcome of the $i$th wildcat well ($x_i = 1$ where the well is a
               success, 0 otherwise)


II. Spatial Model

$\delta(i,j)$  state of point $(i,j)$: 0 or 1, where 1 signifies presence of
               petroleum
$I(t)$         petroleum areas: set of 1-points, $I(t) = \{(i,j) \mid \delta(i,j) = 1\}$
$J(t)$         nonpetroleum areas (or unassigned): set of 0-points,
               $J(t) = \{(i,j) \mid \delta(i,j) = 0\}$
$L$            spatial location: lattice of ordered pairs,
               $L = \{(i,j) \mid i, j \text{ integer}\}$
$N$            set of nearest neighbor points to point $(i,j;t_k)$:
               $N[(i,j;t_k)] = \{(i+x, j+y) \mid x = \pm 1, y = \pm 1\}$
$\ell_{t_k}$   chain of ones laid down from point $(i,j;t_k)$ subject to
               prescribed stopping rules
$S_t$          state of the process at trial $t$: $S_t = [(i,j;t), I(t), J(t)]$